Scaling-Up Bayesian Network Learning to Thousands of Variables Using Local Learning Techniques
نویسندگان
چکیده
State-of-the-art Bayesian Network learning algorithms do not scale to more than a few hundred variables; thus, they fall far short from addressing the challenges posed by the large datasets in biomedical informatics (e.g., gene expression, proteomics, or text-categorization data). In this paper, we present a BN learning algorithm, called the Max-Min Bayesian Network learning (MMBN) algorithm that can induce networks with tens of thousands of variables, or alternatively, can selectively reconstruct regions of interest if time does not permit full reconstruction. MMBN is based on a local algorithm that returns targeted areas of the network and on putting these pieces together. On a small dataset MMBN outperforms other state-of-the-art methods. Subsequently, its scalability is demonstrated by fully reconstructing from data a Bayesian Network with 10,000 variables using ordinary PC hardware. The novel algorithm pushes the envelope of Bayesian Network learning (an NP-complete problem) by about two orders of magnitude.
منابع مشابه
Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables
We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. Our novel algorithm accomplishes this task, scaling both to large domains and ...
متن کاملLearning Bayesian Network Structure using Markov Blanket in K2 Algorithm
A Bayesian network is a graphical model that represents a set of random variables and their causal relationship via a Directed Acyclic Graph (DAG). There are basically two methods used for learning Bayesian network: parameter-learning and structure-learning. One of the most effective structure-learning methods is K2 algorithm. Because the performance of the K2 algorithm depends on node...
متن کاملLearning Bayesian Network Structure Using Genetic Algorithm with Consideration of the Node Ordering via Principal Component Analysis
‎The most challenging task in dealing with Bayesian networks is learning their structure‎. ‎Two classical approaches are often used for learning Bayesian network structure;‎ ‎Constraint-Based method and Score-and-Search-Based one‎. ‎But neither the first nor the second one are completely satisfactory‎. ‎Therefore the heuristic search such as Genetic Alg...
متن کاملAn Introduction to Inference and Learning in Bayesian Networks
Bayesian networks (BNs) are modern tools for modeling phenomena in dynamic and static systems and are used in different subjects such as disease diagnosis, weather forecasting, decision making and clustering. A BN is a graphical-probabilistic model which represents causal relations among random variables and consists of a directed acyclic graph and a set of conditional probabilities. Structure...
متن کاملLearning Bounded Treewidth Bayesian Networks with Thousands of Variables
We present a method for learning treewidthbounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. We propose a novel algorithm for this task, able to scale to large domains and large tr...
متن کامل